The Challenge/Onyx architecture also contains a unique hardware feature, the DMA Engine, which can be used to move data directly between memory and a slave VME device.
Each PIO read requires two transfers over the POWERpath-2 bus: one to send the address to be read, and one to retrieve the data. The latency of a single PIO input is approximately 4 microseconds. PIO write is somewhat faster, since the address and data are sent in one operation. Typical PIO performance is summarized in Table 9-3.
Table 9-3. Typical PIO Bandwidth

| Data Unit Size | Read          | Write          |
|----------------|---------------|----------------|
| D8             | 0.2 MB/second | 0.75 MB/second |
| D16            | 0.5 MB/second | 1.5 MB/second  |
| D32            | 1 MB/second   | 3 MB/second    |
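For context, here is a minimal sketch of what these PIO accesses look like in code, assuming the VME address window has already been mapped into the process with mmap(). The special-file name, map offset, and register offset are hypothetical placeholders, not values from this guide.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define VME_PATH   "/dev/vme/vme1a32n"  /* assumed special-file name; check your system */
#define MAP_OFFSET 0x00100000           /* hypothetical A32 base address of the device */
#define REG_OFFSET 0x10                 /* hypothetical offset of a 32-bit device register */

int main(void)
{
    int fd = open(VME_PATH, O_RDWR);
    if (fd < 0) { perror(VME_PATH); return 1; }

    /* Map one page of VME A32 space into the process address space. */
    volatile uint32_t *base = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, MAP_OFFSET);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Each assignment is one PIO write (address and data sent in one operation);
       each dereference is one PIO read (two POWERpath-2 transfers, roughly 4 usec). */
    base[REG_OFFSET / 4] = 0x1;                       /* D32 PIO write */
    uint32_t status = base[REG_OFFSET / 4];           /* D32 PIO read  */
    printf("status = 0x%08x\n", (unsigned int)status);

    munmap((void *)base, 4096);
    close(fd);
    return 0;
}
```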
When a system has multiple VME busses, you can program concurrent PIO operations from different CPUs to different busses, effectively multiplying the bandwidth by the number of busses. It does not improve performance to program concurrent PIO to a single VME bus.
Tip: When transferring more than 32 bytes of data, you can obtain higher rates using the DMA Engine. See "DMA Engine Access to Slave Devices".
The programming details on user-level interrupts are covered in the IRIX Device Driver Programmer's Guide.
DMA transfers from a Bus Master are always initiated by a kernel-level device driver. To exchange data with a VME Bus Master, you open the device and use read() and write() calls. The device driver sets up the address mapping and initiates the DMA transfers. The calling process is typically blocked until the transfer completes and the device driver returns.
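As a sketch of that pattern, the fragment below opens a hypothetical character device (the path and buffer size are placeholders, not part of any shipped driver) and lets the kernel-level driver run the Bus Master's DMA on its behalf.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEV_PATH "/dev/vmemaster0"  /* hypothetical device created by the kernel driver */
#define BUF_SIZE (64 * 1024)

int main(void)
{
    int fd = open(DEV_PATH, O_RDWR);
    if (fd < 0) { perror(DEV_PATH); return 1; }

    char *buf = malloc(BUF_SIZE);
    if (buf == NULL) { perror("malloc"); return 1; }

    /* The driver maps the buffer, starts the Bus Master's DMA, and blocks
       this process until the transfer completes. */
    ssize_t got = read(fd, buf, BUF_SIZE);      /* device-to-memory transfer */
    if (got < 0) { perror("read"); return 1; }

    ssize_t put = write(fd, buf, (size_t)got);  /* memory-to-device transfer */
    if (put < 0) { perror("write"); return 1; }

    printf("moved %zd bytes in, %zd bytes out\n", got, put);
    close(fd);
    free(buf);
    return 0;
}
```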
The typical performance of a single DMA transfer is summarized in Table 9-4. Many factors can affect the performance of DMA, including the characteristics of the device.
Up to 8 DMA streams can run concurrently on each VME bus. However, the aggregate data rate for any one VME bus will not exceed the values in Table 9-4.
The DMA engine greatly increases the rate of data transfer compared to PIO, provided that you transfer at least 32 contiguous bytes at a time. The DMA engine can perform D8, D16, D32, D32 Block, and D64 Block data transfers in the A16, A24, and A32 bus address spaces.
All DMA engine transfers are initiated by a special device driver. However, you do not access this driver through open/read/write system functions. Instead, you program it through a library of functions, which are documented in the udmalib(3x) reference page. The functions are used in a fixed sequence: open the DMA engine for a particular bus, allocate a buffer the engine can use, build a descriptor of the transfer parameters, start one or more transfers, and then release the descriptor, buffer, and engine. A sketch of this sequence appears below.
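In the sketch, the function names come from udmalib(3x), but the argument lists, structure fields, and constants shown are assumptions made for illustration; consult the reference page and <udmalib.h> for the exact declarations on your system.

```c
#include <stdio.h>
#include <string.h>
#include <udmalib.h>   /* udmalib declarations; see udmalib(3x) for the real interface */

#define VME_BUS    1            /* hypothetical VME bus (adapter) number */
#define SLAVE_ADDR 0x00800000   /* hypothetical A32 address of the slave device */
#define NBYTES     4096

int main(void)
{
    /* Open the DMA engine on one VME bus (assumed signature). */
    udmaid_t *eng = dma_open(DMA_VMEBUS, VME_BUS);
    if (eng == NULL) { perror("dma_open"); return 1; }

    /* Allocate a buffer the DMA engine can reach (assumed signature). */
    void *buf = dma_allocbuf(eng, NBYTES);

    /* Describe the transfer: direction, data unit size (for example D32),
       block mode, and VME address modifier.  The field names and constants
       are defined in <udmalib.h>; fill them in per udmalib(3x). */
    vmeparms_t vp;
    memset(&vp, 0, sizeof vp);
    dmaparms_t *dp = NULL;
    dma_mkparms(eng, &vp, &dp, NBYTES);                    /* assumed signature */

    /* Run one transfer; dma_start() polls until the operation completes. */
    dma_start(eng, (void *)SLAVE_ADDR, buf, NBYTES, dp);   /* assumed signature */

    /* Release the descriptor, the buffer, and the engine. */
    dma_freeparms(eng, dp);
    dma_freebuf(eng, buf);
    dma_close(eng);
    return 0;
}
```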
The typical performance of the DMA engine for D32 transfers is summarized in Table 9-5. Performance with D64 Block transfers is somewhat less than twice the rate shown in Table 9-5. Transfers for larger sizes are faster because the setup time is amortized over a greater number of bytes.
Factors that affect the performance of user DMA include the data transfer mode, the size of each transfer, and the characteristics of the slave device.
The dma_start() function operates in user space; it does not call a kernel-level device driver. This has two important effects. First, overhead is reduced, since there are no mode switches between user and kernel, as there are for read() and write(). This matters because the DMA engine is often used for frequent, small inputs and outputs.
Second, dma_start() does not block the calling process in the sense of suspending it and possibly allowing another process to use the CPU. Instead, it waits in a test loop, polling the hardware until the operation is complete. As you can infer from Table 9-5, typical transfer times range from 50 to 250 microseconds. You can calculate the approximate duration of a call to dma_start() from the amount of data and the operational mode, as sketched below.
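Because the process spins inside dma_start() for the whole transfer, it can help to budget that time explicitly. The helper below is a sketch of such an estimate, using a fixed per-call setup cost plus a per-byte cost; both constants are placeholders to be replaced with values measured on your own bus and transfer mode.

```c
#include <stdio.h>

/* Placeholder costs; substitute values measured on your own system. */
#define SETUP_USEC     50.0    /* assumed fixed setup overhead per dma_start() call */
#define USEC_PER_BYTE   0.02   /* assumed per-byte transfer cost (about 50 MB/second) */

/* Rough estimate of how long a dma_start() call will spin, in microseconds. */
static double dma_start_usec(size_t nbytes)
{
    return SETUP_USEC + (double)nbytes * USEC_PER_BYTE;
}

int main(void)
{
    size_t sizes[] = { 32, 512, 4096, 16384 };
    for (int i = 0; i < 4; i++)
        printf("%6lu bytes: about %.0f microseconds\n",
               (unsigned long)sizes[i], dma_start_usec(sizes[i]));
    return 0;
}
```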
You can use the udmalib functions to access a VME Bus Master device if the device can respond in slave mode. However, this is normally less efficient than using the Bus Master's own DMA circuitry.
Although you can initiate only one DMA engine transfer on a given bus at a time, you can program DMA engine transfers on all the busses in the system concurrently.